Abstract: Evidence theory by Dempster-Shafer for the determination of hormone receptor status in
breast cancer samples was introduced in our previous paper. One major topic pointed out there
is the link between pieces of evidence from different origins. In this paper we address the challenge of
selecting appropriate ways of fusing evidence, depending on the type and quality of the data involved.
We present a parameterized family of evidence combination rules that covers the full range of potential
needs, from emphasizing discrepancies in the measurements to seeking agreement. The
consequences for real patient samples are shown by modeling different decision strategies.


Keywords: evidence theory; theory of belief functions; Dempster-Shafer theory; breast cancer; 
1. Introduction


The Dempster-Shafer theory of evidence (DST) is a generalized framework in probability
theory. First introduced by Dempster between 1966 [1] and 1968 [2] in the context
of Bayesian inference [3], it was developed by Shafer into a comprehensive theory in a
1976 book [4]. A short summary of DST, with an illustrative example of how to create
and combine pieces of evidence, was given by Zadeh in 1986 [5].

In 1988, Smets [6] (Chapter 9) framed the concept of credibility in terms of mathematical
logic. In contrast to Shafer [4], he advocated the “open-world assumption”, i.e., the
possibility of outcomes beyond the “frame of discernment” (e.g., the example of a broken coin).
At the same time, in 1988, Dubois and Prade [7] gave an axiomatic description of how to
define and combine pieces of evidence mathematically.

One common approach to DST is the “transferable belief model” (TBM), which
Smets introduced in 1990 [8]. In the TBM, evidence is fully described by “basic belief masses”
(BBM), sometimes alternatively called “basic belief assignment” (BBA) [9]. The
open-world assumption is realized by allowing a positive BBM for the empty
set, as discussed in 1992 [10]. Conditional belief and plausibility were embedded into a
generalized Bayesian theorem in 1993 [11]. A two-step decision-making
process within the TBM was outlined in 1994 [12]. In the first step, called the “credal” level,
evidence is represented by belief functions as defined in DST. In the second step, called the
“pignistic” level, these are reduced to probability functions, which are then used for
decision making.

Among others, an important elaboration of DST is the “Theory of Hints”,
outlined by Kohlas in 1995 [13]. In 1991, Gebhardt [14] introduced the “context
model” to distinguish between vagueness and uncertainty, which also covers topics such
as refinement and coarsening. The “Dezert-Smarandache theory” (DSmT) [15] specifically
targets the problem of imprecise, uncertain, and highly conflicting data sources in
information fusion.

While in probability theory the calculus of probabilities is fixed, in DST the ways in which
pieces of evidence can influence one another open a wide field of possibilities. Combining
two pieces of evidence into improved evidence is, in general, accomplished by evidence
combination rules (ECR), and there is a large variety of meaningful ECRs. Dempster’s
original rule [4], which redistributes the conflicting BBM among the remaining focal sets by
normalization, is the most obvious one. This rule is commutative and associative, but fails when
sources of evidence become incompatible or strongly conflicting. To overcome the problem of
combining strongly contradicting pieces of evidence, Yager [16] suggested in 1987 assigning
conflicting BBM to the BBM of total ignorance. Good summaries of about ten popular
ECRs are given in [17,18]. More sophisticated are the “Proportional Conflict Redistribution”
rules (PCR) [19,20] within DSmT and some of their improvements [21].

As an alternative to modifying Dempster’s ECR to overcome the problem of conflicting
evidence, Shafer suggested [18] manipulating mass functions by weighting and discounting
(“belief in belief”) rather than diversifying the ECR itself. In DSmT [15], evidence from
several origins can additionally be weighted by importance.

The TBM gives a procedure for converting evidence into probabilities. However,
there is also demand for decision making based directly on evidence. An axiomatic approach was given
in 1990 [22], and the inclusion of loss functions for classification was discussed in 1997 [9]. A
review of decision-making strategies based on the theory of von Neumann and Morgenstern
from 1943 [23] (60th anniversary reprint [24]) was given by Denœux in 2019 [25].

There are many use cases for DST. An obvious one is found in robotics, in
“Simultaneous Localization and Mapping” (SLAM), where data from different
sensors are combined [26]. Every sensor serves as an agent and is the source of a piece of evidence.
Decisions are based on linking these pieces of evidence by an ECR.

Another important application is the combination of classifiers, as outlined by Al-Ani
in 2002 [27]. Elements of a usually high-dimensional feature space are to be classified into
a number of labeled categories, which typically requires random forests or similar classifiers.
A DST-based approach treats the set of labeled categories as the frame of discernment.
The output of each classifier (for each feature vector individually) is then transformed into
a single piece of evidence by assigning BBMs to subsets of the labels. The classifiers
are again combined by linking these pieces of evidence with customized ECRs [28].

This procedure is very close to our approach, with one crucial difference: in our model
the feature space is not mapped entirely onto the categories; instead, a feature vector may
also be mapped to an additional category labeled “undecidable”.


2. Materials and Methods 
2.1. Dempster-Shafer Theory 


Evidence theory by Dempster-Shafer (DST) is based on combining pieces of evidence
rather than dealing with probabilities. A piece of evidence can be seen as a generalization of a
probability function. The essential difference is that, while in probability theory the elements
of the sample space $\Omega$ are mapped to probabilities $\Pr(a)$, $a \in \Omega$, in DST the power set
$\mathcal{P}(\Omega) = 2^\Omega$ of the sample space, now called the “frame of discernment” (FOD), is
mapped to masses $m(A)$, $A \subseteq \Omega$. The mass function $m(A)$ assigns basic belief masses
(BBM) to the elements $A \in \mathcal{P}(\Omega)$, which can be interpreted as degrees of trust in the
proposition $A$. Figure 1 shows this for a sample space, or FOD, with the three possible outcomes
Blue, Red and Green, i.e., $\Omega = \{B, R, G\}$.
[Figure 1: (a) probability; (b) evidence, $m: 2^\Omega \to [0,1]$ with $\sum_{A \subseteq \Omega} m(A) = 1$]


Figure 1. Graphical comparison between probability and evidence: (a) a distribution with probability
function $\Pr(a)$; (b) a piece of evidence represented by a mass function $m(A)$. Note that, with three
overlapping disks, not all possible constellations can be represented graphically due to the lack of
degrees of freedom (5 instead of 6).


A function $m: \mathcal{P}(\Omega) \to [0,1]$ satisfying $\sum_{A \subseteq \Omega} m(A) = 1$ represents a piece of
evidence and, in turn, every piece of evidence is represented by such a function. Special care has to
be taken with the BBM of the empty set, $m(\emptyset)$. If $m(\emptyset) = 0$, the evidence is called normalized.
A closed and an open FOD correspond to a closed-world and an open-world assumption,
respectively; see Figure 2.
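To make these definitions concrete, the following minimal Python sketch (Python 3.9+ assumed) stores a mass function as a dictionary that maps focal sets, represented as `frozenset`s, to basic belief masses; subsets that are not listed carry mass 0. The names `MassFunction`, `is_valid_mass_function` and `is_normalized` are hypothetical helpers for illustration, not part of the authors' implementation.

```python
from math import isclose

# A mass function maps focal sets (frozensets of outcomes) to basic belief masses.
# Subsets that are not listed implicitly carry mass 0.
MassFunction = dict[frozenset, float]

def is_valid_mass_function(m: MassFunction, frame: frozenset) -> bool:
    """Check m: P(frame) -> [0, 1] with all masses summing to 1."""
    if any(not a <= frame for a in m):            # every focal set must lie in the frame
        return False
    if any(not 0.0 <= v <= 1.0 for v in m.values()):
        return False
    return isclose(sum(m.values()), 1.0, abs_tol=1e-12)

def is_normalized(m: MassFunction) -> bool:
    """Normalized evidence gives no mass to the empty set (closed world)."""
    return m.get(frozenset(), 0.0) == 0.0

OMEGA = frozenset({"B", "R", "G"})
m = {frozenset({"R"}): 0.5, frozenset({"B", "G"}): 0.3, OMEGA: 0.2}  # illustrative masses
print(is_valid_mass_function(m, OMEGA), is_normalized(m))  # True True
```

An open-world mass function would simply include the key `frozenset()` with a positive value, which `is_normalized` then rejects.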


[Figure 2: (a) closed world, $\Omega = \{B,R,G\}$, $m(\emptyset) = 0$; (b) open world, $\Omega = \{B,R,G,\ldots\}$]


Figure 2. Closed world vs. open world assumption: (a) In a closed world, no mass is given to the
empty set, thus no outcome beyond $\Omega$ is possible. (b) In an open world, a basic belief mass is given
to the empty set, allowing the model to consider completely unexpected events (e.g., a
broken coin) or to deal with data of low quality.


We currently restrict ourselves to normalized evidence, but we will discuss the origin 
and opportunities of open world models later in the context of evidence combination rules 
(ECR) and vague FOD. 

For every set $S \in \mathcal{P}(\Omega)$, the mass function $m(A)$ intrinsically defines two essential
quantities of DST, the “Belief” and the “Plausibility” of the set $S$:

$$\mathrm{Bel}(S) = \sum_{A \subseteq S} m(A), \qquad \mathrm{Pl}(S) = \sum_{A \cap S \neq \emptyset} m(A) \tag{1}$$


This is why DST is also called the theory of belief functions. 

For normalized evidence we have $\mathrm{Bel}(\Omega) = \mathrm{Pl}(\Omega) = 1$ and $\mathrm{Bel}(\emptyset) = \mathrm{Pl}(\emptyset) = 0$.
This implies that we are sure that the correct answer lies somewhere within $\Omega$ (closed
world). For better understanding, Figure 3 shows the belief $\mathrm{Bel}(S)$ and plausibility $\mathrm{Pl}(S)$ for
two elements of $\mathcal{P}(\Omega)$, namely $\{R\}$ and $\{B,G\}$.
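Formula (1) translates directly into code. The sketch below reuses a dictionary-of-frozensets representation (an illustrative choice, not the authors' code) with made-up masses $m(\{R\}) = 0.5$, $m(\{B,G\}) = 0.3$, $m(\Omega) = 0.2$; by hand, this gives $\mathrm{Bel}(\{R\}) = 0.5$, $\mathrm{Pl}(\{R\}) = 0.7$, $\mathrm{Bel}(\{B,G\}) = 0.3$ and $\mathrm{Pl}(\{B,G\}) = 0.5$.

```python
from math import isclose

def belief(m: dict, s: frozenset) -> float:
    """Bel(S): sum of m(A) over all focal sets A that are subsets of S (Formula (1))."""
    return sum(v for a, v in m.items() if a <= s)

def plausibility(m: dict, s: frozenset) -> float:
    """Pl(S): sum of m(A) over all focal sets A that intersect S (Formula (1))."""
    return sum(v for a, v in m.items() if a & s)

OMEGA = frozenset("BRG")  # frame of discernment {B, R, G}
m = {frozenset("R"): 0.5, frozenset("BG"): 0.3, OMEGA: 0.2}  # illustrative masses

for s in (frozenset("R"), frozenset("BG"), OMEGA):
    bel, pl = belief(m, s), plausibility(m, s)
    print(sorted(s), round(bel, 3), round(pl, 3), "uncertainty:", round(pl - bel, 3))

# For normalized evidence, Bel(Omega) = Pl(Omega) = 1.
assert isclose(belief(m, OMEGA), 1.0) and isclose(plausibility(m, OMEGA), 1.0)
```

The difference `pl - bel` is the uncertainty shown in Figure 3.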
[Figure 3: (a) $\mathrm{Bel}(\{R\})$, $\mathrm{Pl}(\{R\})$; (b) $\mathrm{Bel}(\{B,G\})$, $\mathrm{Pl}(\{B,G\})$]


Figure 3. Belief $\mathrm{Bel}(S)$ and plausibility $\mathrm{Pl}(S)$ illustrated by overlapping disks. The sizes of the areas
represent basic belief masses. The difference $\mathrm{Pl}(S) - \mathrm{Bel}(S)$ between plausibility and belief is
called uncertainty. (a) $\mathrm{Bel}(\{R\})$ and $\mathrm{Pl}(\{R\})$ of the singleton $\{R\}$. (b) $\mathrm{Bel}(\{B,G\})$ and
$\mathrm{Pl}(\{B,G\})$ of the set $\{B,G\}$.


2.2. Evidence Combination Rules 


One strength of DST is the flexibility to combine pieces of evidence with various
ECRs, adjusted to the requirements at hand. We will show how to take advantage of this by
customizing ECRs depending on the origin of the data. Put simply, an ECR is a binary operator $\oplus$
that combines two mass functions $m_1(A)$ and $m_2(A)$, associated with two pieces of evidence,
into a third mass function $m(A)$ representing the fused evidence:


$$m(A) = m_1(A) \oplus m_2(A) \tag{2}$$


In principle, such an operator need not fulfill any properties except $m(A) \in [0,1]$ for all
$A \in \mathcal{P}(\Omega)$ and $\sum_{A \subseteq \Omega} m(A) = 1$.

Most ECRs are commutative, but only in very rare cases are they associative, or even
pseudo-associative in the sense of [17]. There is a neutral element $m_e(A)$, called the “vacuous
mass function”, with $m_e(\Omega) = 1$ and $m_e(A) = 0$ for $A \neq \Omega$, satisfying
$m(A) = m_e(A) \oplus m(A) = m(A) \oplus m_e(A)$. The evidence associated with $m_e(A)$ is also called
“total ignorance”, representing the lack of knowledge. Note that $m(A) = m(A) \oplus m^*(A)$ does not
necessarily imply that $m^*(A)$ is the vacuous mass function; a counterexample is given in [29].

In general, an inverse function $m^{-1}(A)$ with $m(A) \oplus m^{-1}(A) = m^{-1}(A) \oplus m(A) = m_e(A)$
does not exist for every $m(A)$. This is easy to understand: for a given piece of evidence
represented by $m(A)$, it is unlikely to find further evidence that combines with it into total
ignorance; some knowledge brought together with other knowledge cannot end in knowing nothing.
In algebraic terms, the set of all possible $m(A)$ therefore has the structure of a unital magma [30].

An easy way to combine pieces of evidence is to multiply the masses of intersecting
focal sets [8]:

$$(m_1 \oplus m_2)(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C) \tag{3}$$


This ECR is called the “conjunctive” rule [26] and is fully compatible with the open-world
assumption in the TBM framework. Unfortunately, the resulting mass function
$m(A) = m_1(A) \oplus m_2(A)$ is in general not normalized, i.e., $m(\emptyset) \neq 0$. Figure 4 illustrates
Formula (3) for two different cases in mosaic plots: the left one with rather consistent evidence,
the right one with rather contradictory evidence (for details see Appendix A).
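A direct transcription of Formula (3) into Python, using the frozenset representation from above, makes the loss of normalization visible. This is a sketch; `conjunctive` is a hypothetical helper name, and the example masses are made up so that $\{R\}$ in the first piece of evidence conflicts with $\{B\}$ in the second.

```python
from collections import defaultdict
from itertools import product
from math import isclose

def conjunctive(m1: dict, m2: dict) -> dict:
    """Unnormalized conjunctive rule (Formula (3)): m(A) = sum over B ∩ C = A of m1(B) m2(C)."""
    out = defaultdict(float)
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        out[b & c] += mb * mc  # b & c is the intersection B ∩ C; it may be the empty set
    return dict(out)

OMEGA = frozenset("BRG")
m1 = {frozenset("R"): 0.6, frozenset("BR"): 0.3, OMEGA: 0.1}  # illustrative
m2 = {frozenset("B"): 0.7, OMEGA: 0.3}                        # illustrative
m12 = conjunctive(m1, m2)

conflict = m12.get(frozenset(), 0.0)  # m(empty) > 0: the result is not normalized
print("m(empty) =", round(conflict, 2))  # m(empty) = 0.42
assert isclose(sum(m12.values()), 1.0)   # the total mass is still 1
```

The 0.42 assigned to the empty set corresponds to the white area of a mosaic plot such as Figure 4; how it is redistributed is exactly what distinguishes the ECRs discussed next.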

The 49 rectangles within the two squares are colored in the color of the corresponding
intersection, where the white areas are masses of contradicting evidence. Areas of the same
color are added. Note that the mosaic plots in Figure 4 can be seen as an operation table
for the operator $\cap$, e.g., $\{R\} \cap \{B,R\} = \{R\}$. For a closed world, the white
areas must be redistributed among all others.
Figure 4. Intersection of basic belief masses of two pieces of evidence. The seven colors in the
mosaic plot represent (in order) the seven sets {B}, {R}, {B, R}, {G}, {B, G}, {R, G}, {B, R, G}. The
49 rectangles within the two squares are colored according to the intersection. White represents the
empty set $\emptyset$. For a closed world, the white areas must be redistributed among all others. (a) Rather
consistent pieces of evidence. (b) Rather contradicting pieces of evidence.


How to distribute the mass of the empty set $m(\emptyset)$ among all other masses $m(A)$
depends on the needs of the model. If there is a high chance of contradiction between
$m_1(A)$ and $m_2(A)$, most of $m(\emptyset)$ will be allocated to $m(\Omega)$. Conversely, if there is a low
chance of contradiction, $m(\emptyset)$ will be redistributed among the remaining focal sets, in particular
the singletons $m(\{a\})$, $a \in \Omega$. As discussed in the introduction, there is a wide range of
possible allocations.

For a model that distinguishes between hormone receptor statuses, it is convenient to
use a parameterized family $\mathcal{E} = \{\oplus_\lambda\}_{\lambda \in [0,1]}$ of ECRs. It is very similar to the one
introduced in [31], but uses a parameter $\lambda \in [0,1]$ to meet local requirements. Given two
mass functions $m_1(A)$ and $m_2(A)$, we define


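$$m(A) = m_1(A) \oplus_\lambda m_2(A) = \begin{cases} 0 & A = \emptyset \\ \dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \lambda \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)} & \emptyset \neq A \subsetneq \Omega \\ 1 - \sum_{S \subsetneq \Omega} m(S) & A = \Omega \end{cases} \tag{4}$$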


The parameter $\lambda$ in Formula (4) provides flexibility to adapt to circumstances. The
restriction $\lambda \leq 1$ is motivated by restricting ourselves to an interpolation-type ECR; a
value $\lambda > 1$ would yield an extrapolation-type ECR as described in [31].

Dempster’s original ECR [4] is equivalent to setting $\lambda = 1$. This ECR is associative
and commutative. Unfortunately, it causes significant problems when the pieces of evidence
are rather contradictory [4]. The reason is that only the non-contradictory intersection of the
two combined pieces of evidence is used for evaluating the new masses. If this intersection
is small, the rescaling due to normalization blurs out information.

In contrast, Yager [16] assigns all contradicting mass to $m(\Omega)$, which is equivalent
to setting $\lambda = 0$. For most applications this approach is too conservative and hinders
merging similar pieces of evidence into stronger evidence. However, if the pieces of evidence
originate from different types of sources, this ECR can be very helpful.
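The two limiting cases can be compared on the classic two-expert conflict example associated with Zadeh's critique of Dempster's rule. The Python sketch below assumes that the middle branch of Formula (4) divides the conjunctive masses by $1 - \lambda K$, where $K$ is the conflicting mass; this form reproduces the limits stated in the text ($\lambda = 0$ Yager, $\lambda = 1$ Dempster) but is an assumption of this sketch. The function name and the masses are illustrative.

```python
from collections import defaultdict
from itertools import product

def combine_lambda(m1: dict, m2: dict, lam: float, omega: frozenset) -> dict:
    """Parameterized ECR oplus_lambda (sketch).

    ASSUMPTION: proper non-empty sets receive q(A) / (1 - lam * K), with q the conjunctive
    combination (Formula (3)) and K = q(empty set); the remainder goes to omega.
    lam = 0 gives Yager's rule, lam = 1 gives Dempster's rule.
    """
    q = defaultdict(float)
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        q[b & c] += mb * mc
    k = q.pop(frozenset(), 0.0)  # conflicting mass K
    denom = 1.0 - lam * k
    if denom <= 0.0:
        raise ValueError("total conflict: rule undefined for this lambda")
    out = {a: v / denom for a, v in q.items() if a != omega}
    out[omega] = 1.0 - sum(out.values())  # everything else is total ignorance
    return out

# Two strongly conflicting experts (illustrative masses).
OMEGA = frozenset({"meningitis", "concussion", "tumor"})
M, C, T = (frozenset({x}) for x in ("meningitis", "concussion", "tumor"))
expert1 = {M: 0.99, T: 0.01}
expert2 = {C: 0.99, T: 0.01}

dempster = combine_lambda(expert1, expert2, 1.0, OMEGA)
yager = combine_lambda(expert1, expert2, 0.0, OMEGA)
print("Dempster m({tumor}) =", round(dempster[T], 4))  # 1.0
print("Yager    m({tumor}) =", round(yager[T], 4), " m(Omega) =", round(yager[OMEGA], 4))
```

Dempster's rule puts all mass on the diagnosis that both experts considered very unlikely, whereas Yager's rule moves almost all mass into total ignorance; intermediate values of $\lambda$ lie between these extremes.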

Depending on the relation between the agents, different values of $\lambda$ will be adequate.
For pieces of evidence that tend to contradict one another, such as gene expression combined
with immunohistochemical (IHC) measurements, a small value of $\lambda$ is favored. For
pieces of evidence with a low probability of contradicting each other, such as the gene
expression of a receptor gene combined with that of its co-gene, a greater value of $\lambda$ better
allows the consolidation of evidence gained from gene expression. In any case, we should always avoid
giving too much weight to any element of $\mathcal{P}(\Omega)$, especially to singletons.

Another major benefit of introducing $\lambda$ arises when combining different receptor statuses
into one hormone receptor status. We found that this operation can also be represented by
suitable elements $\oplus_\lambda \in \mathcal{E}$. Our data suggest $\lambda \approx 0.5$ as the optimum for this
task. An illustrated example of evidence linkage and the implications of $\lambda$ can be found in
Appendix A.


3. Results 
3.1. Model Adaptation 


For hormone receptor determination, the FOD is restricted to two outcomes, hormone
receptor positive and hormone receptor negative. We assume a closed world, so
there are no possible outcomes other than the two elements of $\Omega$:


$$\Omega = \{+, -\} \tag{5}$$


The simplicity of this model allows us to describe all BBMs by only two parameters,
$\alpha$ and $\beta$:


$$m(\{+\}) = \alpha, \qquad m(\{-\}) = \beta, \qquad m(\{+,-\}) = m(\Omega) = 1 - \alpha - \beta \tag{6}$$
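Because the frame has only two elements, the conjunctive combination of Formula (3) can be written out explicitly in the parameters of Formula (6). As a short worked derivation, write $\alpha_i$, $\beta_i$ for the masses of the $i$-th piece of evidence and $\theta_i = 1 - \alpha_i - \beta_i$ for its mass of ignorance ($\theta_i$ and $q$ are notation introduced only for this derivation). Collecting the nine products $m_1(B)\,m_2(C)$ by their intersection $B \cap C$ gives the unnormalized masses

$$\begin{aligned}
q(\{+\}) &= \alpha_1\alpha_2 + \alpha_1\theta_2 + \theta_1\alpha_2, \\
q(\{-\}) &= \beta_1\beta_2 + \beta_1\theta_2 + \theta_1\beta_2, \\
q(\Omega) &= \theta_1\theta_2, \\
q(\emptyset) &= \alpha_1\beta_2 + \beta_1\alpha_2 .
\end{aligned}$$

The four lines add up to $(\alpha_1+\beta_1+\theta_1)(\alpha_2+\beta_2+\theta_2) = 1$, and the conflicting mass $q(\emptyset)$ is exactly the quantity that an ECR from the family $\mathcal{E}$ has to redistribute.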


The current model involves six independent data sources for generating evidence. Four of
them originate from gene expression and two from IHC measurements. The gene expression data
consist of normalized values for the abundance of estrogen, co-estrogen, progesterone
and co-progesterone, where the co-genes are genes closely related to the receptor genes
themselves. How to transform gene expression data into BBMs given by $\alpha_{\mathrm{expr}}$ and $\beta_{\mathrm{expr}}$ is
the subject of our previous papers [32,33].

IHC data originate from IHC measurements of the estrogen and progesterone receptors.
These measurements can be continuous or discrete (or even missing). How to
transform these data into appropriate $\alpha_{\mathrm{ihc}}$ and $\beta_{\mathrm{ihc}}$ has also been discussed previously [32,33].

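Putting these together in our existing model, the BBM $m_{\mathrm{horm}}$ describing the evidence
of the hormone receptor status is calculated as

$$m_{\mathrm{horm}} = \Big( \big( m^{\mathrm{esr}}_{\mathrm{expr}} \oplus_1 m^{\mathrm{esr}}_{\mathrm{co}} \big) \oplus_0 m^{\mathrm{esr}}_{\mathrm{ihc}} \Big) \circledast \Big( \big( m^{\mathrm{pgr}}_{\mathrm{expr}} \oplus_1 m^{\mathrm{pgr}}_{\mathrm{co}} \big) \oplus_0 m^{\mathrm{pgr}}_{\mathrm{ihc}} \Big) = m^{\mathrm{esr}} \circledast m^{\mathrm{pgr}} \tag{7}$$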


Missing data are represented by the vacuous mass function. In terms of
Formula (4), $\oplus_1$ stands for Dempster’s ECR and $\oplus_0$ for Yager’s ECR. The
operator $\circledast$ does not represent a typical ECR, but a formal procedure reflecting common
clinical decision making, as given in Formula (8) below:


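$$\begin{aligned}
\alpha_{\mathrm{horm}} &= m_{\mathrm{horm}}(\{+\}) = (m^{\mathrm{esr}} \circledast m^{\mathrm{pgr}})(\{+\}) = \max\big(m^{\mathrm{esr}}(\{+\}), m^{\mathrm{pgr}}(\{+\})\big) = \max(\alpha_{\mathrm{esr}}, \alpha_{\mathrm{pgr}}) \\
\beta_{\mathrm{horm}} &= m_{\mathrm{horm}}(\{-\}) = (m^{\mathrm{esr}} \circledast m^{\mathrm{pgr}})(\{-\}) = \min\big(m^{\mathrm{esr}}(\{-\}), m^{\mathrm{pgr}}(\{-\})\big) = \min(\beta_{\mathrm{esr}}, \beta_{\mathrm{pgr}})
\end{aligned} \tag{8}$$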
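Since the operator of Formula (8) acts only on the two parameters of Formula (6), it can be sketched in a few lines of Python. The type and function names are hypothetical, and the input values are illustrative, not patient data.

```python
Binary = tuple[float, float]  # (alpha, beta) = (m({+}), m({-})); m(Omega) = 1 - alpha - beta

def clinical_and(esr: Binary, pgr: Binary) -> Binary:
    """Operator of Formula (8): positive if either receptor is positive (max),
    negative only as far as both receptors are negative (min)."""
    alpha = max(esr[0], pgr[0])
    beta = min(esr[1], pgr[1])
    # alpha + beta <= 1 holds automatically: beta cannot exceed the beta of the
    # receptor that attains the maximum alpha.
    return alpha, beta

# Medium evidence for ESR positive, strong evidence for PGR negative (made-up values).
print(clinical_and((0.6, 0.1), (0.05, 0.8)))  # (0.6, 0.1)
```

The strong negative progesterone evidence disappears entirely from the result, which anticipates the shortcoming discussed below.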


However, this model suffers from a couple of shortcomings. The following list of 
improvements addresses the problems and provides credible results. 
• The operator $\circledast$, as defined in Formula (8), is not fully compatible with DST. There is
always a dependence between the two receptor statuses, whereas DST in its original
form requires independent BBMs; this is obviously not the case for the estrogen and
progesterone receptors. To absorb this correlation, we suggest replacing $\circledast$ by
$\oplus_{0.5}$, giving estrogen and progesterone a balanced contribution to both BBMs.

• The operator $\oplus_1$ for combining evidence from gene expression and
co-gene expression can be problematic in the case of conflicting expression values. In
a previous paper [32] we introduced mass limits $\bar{\alpha}$ and $\bar{\beta}$ for the BBMs to tackle this
issue. We retain these mass limits, but replace $\oplus_1$ by $\oplus_{0.9}$ as an additional safeguard.

• When combining gene expression evidence with IHC evidence, the operator $\oplus_0$ puts
too much weight on the mass of ignorance, $m(\Omega)$, in case of conflict. Therefore we
suggest slightly increasing $\lambda$ and replacing $\oplus_0$ by $\oplus_{0.1}$. At the lower end of the
$\lambda$-range, the influence of $\lambda$ on the ECR is significantly smaller than at the upper end. As long
as there is profound confidence in the data, particularly in the IHC measurements,
replacing $\oplus_0$ by, e.g., $\oplus_{0.3}$ is therefore also an option.


• It turned out in the past that the optimal choice for the co-gene of progesterone is
mostly estrogen itself. In that case, although $m^{\mathrm{pgr}}_{\mathrm{co}}$ and $m^{\mathrm{esr}}_{\mathrm{expr}}$ are calculated
differently and therefore vary numerically, they are essentially generated from the same gene
expression data, whereas DST preferably assumes that the input data used to generate evidence
are independent. In contrast to estrogen, progesterone expression data are often diffuse, and it
might be impossible to find a decent co-gene. This issue can easily be resolved by replacing
$m^{\mathrm{pgr}}_{\mathrm{co}}$ with the vacuous mass function. For the sake of consistency, however, we currently
keep the configuration that uses estrogen as the co-gene for progesterone.


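Taking all these issues into account, we suggest the improved model

$$m_{\mathrm{horm}} = \Big( \big( m^{\mathrm{esr}}_{\mathrm{expr}} \oplus_{0.9} m^{\mathrm{esr}}_{\mathrm{co}} \big) \oplus_{0.1} m^{\mathrm{esr}}_{\mathrm{ihc}} \Big) \oplus_{0.5} \Big( \big( m^{\mathrm{pgr}}_{\mathrm{expr}} \oplus_{0.9} m^{\mathrm{pgr}}_{\mathrm{co}} \big) \oplus_{0.1} m^{\mathrm{pgr}}_{\mathrm{ihc}} \Big) \tag{9}$$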
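Putting the pieces together, the following self-contained Python sketch evaluates the structure of Formula (9) on the binary frame, using the closed-form binary combination. It relies on an assumption stated in the code: the middle branch of Formula (4) is taken to divide by $1 - \lambda K$. A missing measurement is treated as the vacuous mass function. All function names and input values are hypothetical, and the conversion of raw measurements into $(\alpha, \beta)$ described in [32,33] is not reproduced.

```python
Binary = tuple[float, float]  # (alpha, beta) = (m({+}), m({-}))
VACUOUS: Binary = (0.0, 0.0)  # missing data = total ignorance, m(Omega) = 1

def combine(m1: Binary, m2: Binary, lam: float) -> Binary:
    """oplus_lambda on Omega = {+,-}.

    ASSUMPTION: the middle branch of Formula (4) is q(A) / (1 - lam * K), with q the
    conjunctive combination and K = alpha1*beta2 + beta1*alpha2 the conflicting mass.
    """
    (a1, b1), (a2, b2) = m1, m2
    t1, t2 = 1.0 - a1 - b1, 1.0 - a2 - b2  # masses of ignorance m(Omega)
    k = a1 * b2 + b1 * a2                  # {+} ∩ {-} is empty
    denom = 1.0 - lam * k
    if denom <= 0.0:
        raise ValueError("combination undefined for this lambda")
    alpha = (a1 * a2 + a1 * t2 + t1 * a2) / denom
    beta = (b1 * b2 + b1 * t2 + t1 * b2) / denom
    return alpha, beta

def receptor(expr: Binary, co: Binary, ihc: Binary) -> Binary:
    """One receptor branch of Formula (9): expression with co-gene, then with IHC."""
    return combine(combine(expr, co, 0.9), ihc, 0.1)

def hormone(esr: tuple, pgr: tuple) -> Binary:
    """Formula (9): both receptor branches linked by oplus_0.5."""
    return combine(receptor(*esr), receptor(*pgr), 0.5)

# Made-up inputs (not patient data): (expr, co, ihc) per receptor; PGR IHC is missing.
esr = ((0.7, 0.1), (0.6, 0.1), (0.8, 0.05))
pgr = ((0.5, 0.2), (0.4, 0.2), VACUOUS)
alpha, beta = hormone(esr, pgr)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, m(Omega) = {1 - alpha - beta:.3f}")
```

Because the vacuous mass function is the neutral element, the missing progesterone IHC value leaves that branch unchanged, as stated for Formula (7).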


The operators $\oplus_{0.1}$ and $\oplus_{0.9}$ are small deviations from the original model and
mainly serve to increase prediction stability in cases such as the one described in [5]. Graphic
examples are shown in Figures 5 and 6.


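[Figure 5 (a): evidence flow of model (9) for sample 881, with inputs esr_expr, esr_co, esr_ihc, pgr_expr, pgr_co, pgr_ihc combined by $\oplus_{0.9}$, $\oplus_{0.1}$ and $\oplus_{0.5}$]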


Figure 5. Contradictory data inducing an undecidable outcome: (a) model (9) illustrated by a sample
with an indecisive outcome (sample id = 881). Both IHC measurements, $\mathrm{esr}_{\mathrm{ihc}}$ and $\mathrm{pgr}_{\mathrm{ihc}}$, are receptor
negative, but three out of four pieces of evidence based on gene expression indicate a receptor positive
status. Red areas represent masses $\alpha$ for positive hormone status, blue areas represent masses $\beta$
for negative hormone status, and centers represent the masses $1 - \alpha - \beta$ of $\Omega = \{+,-\}$. (b) Choosing
an inappropriate $\lambda$ for the ECRs results in a dubious prognosis.
Figure 6. Consistent data inducing a reliable outcome: (a) model (9) illustrated by a sample with a
very clear outcome (sample id = 1980). (b) When the evidence is highly consistent, the parameter $\lambda$ has
practically no influence on the results.


The shift from $\circledast$ to $\oplus_{0.5}$ requires a more detailed explanation. For a sample
to be receptor positive, only one of the two receptors (estrogen OR progesterone) needs
to be positive, while for being negative both receptors (estrogen AND progesterone) have
to be negative. The operator $\circledast$ of Formula (8) encodes exactly this rule and therefore
produces a misleading shift towards hormone receptor positive: if one of the two receptors
has medium evidence for being positive and the other has strong evidence for being negative,
$\circledast$ still results in evidence favoring a positive hormone receptor outcome.

Moreover, there is a strong connection between the two receptor genes. A progesterone 
positive sample will almost always be estrogen positive while an estrogen negative sample 
is very likely to be progesterone negative. However, the approach given in Formula (8) is 
based on the assumption of almost independent receptors. 

Combining the receptor evidence with $\oplus_{0.5}$, on the other hand, fixes the above issues.
In both cases it is still very likely that the hormone receptor status concluded from the
evidence will be classified as “undecidable”, but in case of misclassification the probability
of erroneously positive classified samples is reduced considerably. This is in line
with clinical demands.
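The failure mode described above can be reproduced numerically with made-up masses: medium evidence for estrogen receptor positive and strong evidence for progesterone receptor negative. The Python sketch again assumes that the middle branch of Formula (4) divides by $1 - \lambda K$, as noted in the code, and applies the decision rule of Section 3.4 (a singleton is accepted if its mass exceeds 0.5). Names and numbers are illustrative.

```python
Binary = tuple[float, float]  # (alpha, beta) = (m({+}), m({-}))

def clinical_and(esr: Binary, pgr: Binary) -> Binary:
    """Formula (8): max of the positive masses, min of the negative masses."""
    return max(esr[0], pgr[0]), min(esr[1], pgr[1])

def combine(m1: Binary, m2: Binary, lam: float) -> Binary:
    """oplus_lambda on {+,-}, ASSUMING Formula (4) divides by 1 - lam * K.
    For lam <= 1 and conflict K < 1 the denominator stays positive."""
    (a1, b1), (a2, b2) = m1, m2
    t1, t2 = 1.0 - a1 - b1, 1.0 - a2 - b2
    denom = 1.0 - lam * (a1 * b2 + b1 * a2)
    return ((a1 * a2 + a1 * t2 + t1 * a2) / denom,
            (b1 * b2 + b1 * t2 + t1 * b2) / denom)

def decide(m: Binary) -> str:
    """Binary decision rule: accept a singleton if its mass exceeds 0.5."""
    alpha, beta = m
    return "positive" if alpha > 0.5 else "negative" if beta > 0.5 else "undecidable"

esr = (0.6, 0.1)   # medium evidence for ESR positive (illustrative)
pgr = (0.05, 0.8)  # strong evidence for PGR negative (illustrative)
old, new = clinical_and(esr, pgr), combine(esr, pgr, 0.5)
print("Formula (8):", decide(old), old)
print("oplus_0.5:  ", decide(new), tuple(round(x, 3) for x in new))
```

With these inputs the max/min operator yields a positive call, while $\oplus_{0.5}$ yields an undecidable result that leans towards negative, which is the qualitative behavior described in the text.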


3.2. Examples 


The data set for the following results consists of 2559 freely available breast cancer 
samples from the Gene Expression Omnibus [34]. For each sample, at least one IHC 
measurement of a hormone receptor was performed as part of the respective study. Details 
can be found in Appendix B. 

In the first example (sample id 881 from the data set), gene expression data and IHC
measurements contradict each other. In addition, the gene expression of progesterone is
not very accurate, as can be seen from the differing measurements. This leads to a very
diffuse final evidence, and therefore no decision can reliably be made.

Figure 5a shows the evolution of evidence for this particular sample. Figure 5b shows
the importance of choosing the right $\lambda$ for the ECRs: merging gene expression evidence
with unclear IHC evidence can result in a dubious prognosis when $\lambda$ is chosen too large.

In the second example (sample id 1980 from the data set), there is strong conformity
in the data. Although one IHC measurement is missing, the pieces of evidence accumulate to
a strong belief in hormone receptor positive; see the left panel of Figure 6. The right panel of
Figure 6 demonstrates that in the case of consistent evidence the influence of the parameter $\lambda$
can be neglected.

The last example (sample id 2365 from the data set) is a very contradictory one with respect
to the data at hand. Large amounts of the final mass are assigned to lack of
knowledge, as can be seen from the large central circles in Figure 7. The influence of $\lambda$
can change from case to case.
Figure 7. Contradicting data inducing uncertainty: (a) model (9) illustrated by a sample with very
contradictory input data (sample id = 2365). (b) Even when setting all $\lambda = 0.5$ in the ECRs, no
conclusive evidence is generated. Nevertheless, the influence of $\lambda$ can change from case to case.


3.3. Analysis 


As can be seen in Table 1a and Figure 8, the switch from our previous approach
(Formula (7)) to an improved linkage between the two hormone receptors (Formula (9))
entails a shift towards receptor negative. This is reflected by 86 samples clinically classified
as “uncertain” now being classified as “receptor negative”, and 78 samples clinically
classified as “receptor positive” now being classified as “uncertain”. The agreement between
the two classifications is quantified by Cohen’s $\kappa = 0.877$. Using a constant $\lambda$ for all ECRs
instead of the values in Formula (9) only influences numerically problematic samples, as can be
seen in Table 1b.

The left panel of Figure 8 shows the change in $\alpha$ (red dots) and $\beta$ (blue dots), while
the right panel illustrates Table 1 in an alluvial diagram.


Table 1. Clinical decision making vs. flexible risk: (a) change from clinical decision making to
Formula (9), $\kappa = 0.877$; there is a trend towards receptor negative (upper triangular matrix). (b) An
influence of $\lambda$ is only given for numerically problematic samples, $\kappa = 0.966$.

(a)
                            flexible risk
                         pos      unc      neg
clinical        pos     1287       78        1
                unc        3       51       86
                neg        0        0     1013

(b)
                            λ = 0.5 constant
                         pos      unc      neg
flexible risk   pos     1268       22        0
                unc       14      107        8
                neg        0        2     1098
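As a consistency check, the reported $\kappa$ values can be recomputed from Table 1 with the standard unweighted Cohen's kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement (share of the diagonal) and $p_e$ the agreement expected from the row and column totals. In this plain-Python sketch (the helper name is hypothetical), the lower-middle entry of Table 1b is taken as 2, the value implied by requiring the row sums of Table 1b to match the "flexible risk" column sums of Table 1a (both tables then cover 2519 samples).

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Unweighted Cohen's kappa for a square contingency table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n                 # observed agreement
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Category order: pos, unc, neg (rows: first method, columns: second method).
table_1a = [[1287, 78, 1], [3, 51, 86], [0, 0, 1013]]
table_1b = [[1268, 22, 0], [14, 107, 8], [0, 2, 1098]]
print(round(cohens_kappa(table_1a), 3), round(cohens_kappa(table_1b), 3))  # 0.877 0.966
```

Both recomputed values agree with the ones stated in the caption of Table 1.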
Figure 8. Illustration of the shift towards a hormone receptor negative outcome by an improved linkage
between hormone receptors (Formula (9)): (a) red dots are weights $\alpha$ for receptor positive, blue dots
are weights $\beta$ for receptor negative. (b) The incorrect favoring of positive hormone receptor status has
been revised by using $\oplus_{0.5}$ instead of $\circledast$.


3.4. Decision Making

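We do not change our strategy for decision making as proposed in our previous
work [32,33]. That is, we consider an outcome $A$ as “true” if the belief in it is larger than
the plausibility of its complement $A'$. Let $\mathcal{T} \subseteq \mathcal{P}(\Omega)$ be the set of all “true”
elements of $\mathcal{P}(\Omega)$:

$$A \in \mathcal{T} \iff \mathrm{Bel}(A) > \mathrm{Pl}(A'), \qquad A' = \Omega \setminus A$$

In the simple case $\Omega = \{+,-\}$ we have $\mathrm{Pl}(\{-\}) = \beta + (1 - \alpha - \beta) = 1 - \alpha$ and analogously $\mathrm{Pl}(\{+\}) = 1 - \beta$, so the rule reduces to

$$\{+\} \in \mathcal{T} \iff \alpha > 0.5, \qquad \{-\} \in \mathcal{T} \iff \beta > 0.5$$

Note that $\Omega \in \mathcal{T}$ always holds under the closed-world assumption.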
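The decision rule can also be evaluated literally from belief and plausibility, which makes the reduction to $\alpha > 0.5$ and $\beta > 0.5$ visible. A minimal Python sketch for the closed binary frame (function names and test values are hypothetical):

```python
def is_true(alpha: float, beta: float, outcome: str) -> bool:
    """A in T  <=>  Bel(A) > Pl(A'), evaluated on the closed frame Omega = {+,-}."""
    theta = 1.0 - alpha - beta                    # m(Omega)
    bel = {"+": alpha, "-": beta}                 # Bel of the singletons
    pl = {"+": alpha + theta, "-": beta + theta}  # Pl of the singletons
    complement = "-" if outcome == "+" else "+"
    return bel[outcome] > pl[complement]

def decide(alpha: float, beta: float) -> str:
    """Map (alpha, beta) to a status; at most one singleton can be true since alpha + beta <= 1."""
    if is_true(alpha, beta, "+"):
        return "positive"
    if is_true(alpha, beta, "-"):
        return "negative"
    return "undecidable"

for a, b in [(0.7, 0.1), (0.1, 0.6), (0.45, 0.4)]:
    print((a, b), decide(a, b))
```

Samples for which neither singleton is true correspond to the “undecidable” category introduced in Section 1.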


4. Discussion
4.1. Quality of Data 


In DST, the dogma “a (machine learning) model is only as good as the data it is fed” 
can be understood from a different perspective. This guiding principle is still valid, but 
lack of data quality can be coped with in the BBMs and ECRs by adequate parametrization. 
Here DST offers additional flexibility. 

In our model with only two possible outcomes, this is simple. The best example is the
modeling of the BBM for the IHC status: the less confidence there is in the data, the more
mass is assigned to subsets of $\Omega$ with more than one element, i.e., blurred decisions. With
increasing confidence in the IHC measurements, the corresponding singletons (i.e., crisp
decisions) receive more mass.


4.2. From Data to Evidence 


This issue was the subject of our earlier papers [32,33], so we only discuss it
briefly. Gene expression values are converted into BBMs using logistic regression.
In addition, two mass limits $\bar{\alpha}$ and $\bar{\beta}$ are introduced for the following purposes.

The most important one is to account for possibly erroneous gene expression values
by keeping masses significantly smaller than 1. A welcome side effect is that some rare
numerically problematic cases are avoided.

The conversion of IHC measurements into BBM is again realized as described in [32,33]. 
We assume that about 85% of the IHC measurements are correct. 
4.3. The Functionality of $\lambda$ in $\oplus_\lambda$


We introduced the parameter $\lambda$ to adapt the decision strategy specifically to the properties
of the data and of the origin from which evidence is generated. The more the data sources differ
in nature, the smaller $\lambda$ should be chosen. This prevents too much mass from accumulating in
the singletons when conflicting evidence is mixed. We call this strategy a conservative ECR.

On the other hand, if the data sources are homogeneous, a large $\lambda$ can be chosen. We
call this case a risky ECR. With a risky ECR, care must be taken that
contradictory singletons do not enter simultaneously with masses close to 1. In our model,
this case is prevented by the mass limits $\bar{\alpha}$ and $\bar{\beta}$.

Theoretically, it would also be possible to choose $\lambda > 1$, as suggested in [31]. That would
correspond to an extrapolation in the sense that two consistent pieces of evidence not only
increase certainty but also amplify each other to something stronger than their sum.
However, our focus is on finding possible contradictions in the data, and therefore we see no
point in merging evidence for hormone receptor status determination with $\lambda > 1$.

There is even more potential in varying $\lambda$ when combining the two hormone
receptors, estrogen and progesterone. Our suggestion (Formula (9)) is $\lambda = 0.5$. By
varying $\lambda$ in the linkage between the two hormone receptors, the number of unclassified
samples can be regulated conveniently. This is illustrated in Figure 9.


[Figure 9: (a) fraction of uncertain samples as a function of $\lambda$; (b) number of classifications as a function of the fixed mass of ignorance $\omega$]
Figure 9. Uncertainty vs. flexible risk. (a) Increasing risk decreases uncertainty: the number of
uncertain samples depends on the parameter $\lambda$ in combining the hormone receptors with $m^{\mathrm{esr}} \oplus_\lambda m^{\mathrm{pgr}}$.
The yellow area shows 45 out of 2519 samples (1.8%) that are always uncertain, independent of
the choice of $\lambda$. The red area changes from uncertain to receptor positive with increasing $\lambda$, the blue
area changes to receptor negative. (b) Fixed mass of ignorance: number of classifications when fixing
the weight $m(\Omega) = \omega$ as described by Formula (11).


4.4. Training of $\lambda$ in $\oplus_\lambda$ on Real Data


The parameter $\lambda$ is currently set according to intuitive arguments rather than strict
mathematical rules. It would be interesting to investigate whether an algorithm exists that
calculates $\lambda$ from arbitrary training data, and to develop a mechanism that suggests
an optimal choice.

Due to a lack of clean training data of sufficiently high quality, we have run simulations
to train $\lambda$ appropriately. It turned out that this task is far from trivial and needs further
investigation.


4.5. Enhanced Evidence Combination Rules 


There are many ECRs available, and existing ones are constantly being developed
further. However, none of these developments can state with certainty or explain
conclusively which ECR is beneficial for which application. Therefore we proposed setting the
parameter $\lambda$ according to expert knowledge on the nature of the data.

In any case, the parameterized ECR introduced here covers a very wide range
of possible combinations of evidence. Unfortunately, it is difficult to assess whether
additional mathematical requirements for an ECR, such as commutativity, (pseudo-)associativity,
idempotency, invertibility or other characteristics of binary operators, could
provide additional value.


4.6. An Evidence Combination Rule with Constant Ignorance 


The larger the expected contradiction between the two BBMs, the smaller one will
usually choose $\lambda$. On the other hand, if expected contradictions are small, more risk
may be taken by choosing a larger $\lambda$. The question arises why one must choose $\lambda$ at
all, rather than adapting $\lambda$ according to some formula. For example, the reciprocal value
of the total mass could be used as the setting point:


1 
*% Fpncao mi (Bym2(C) ” 


The net effect of this strategy would be to reduce the variability of m(QO.) within the 
samples. 

This approach can be pursued particularly elegantly if one assumes the resulting mass
of total ignorance to be an a priori constant, i.e., $m(\Omega) = \omega$. The corresponding ECR
then reads


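$$
m(A) = (m_1 \oplus m_2)(A) =
\begin{cases}
0, & A = \emptyset,\\[4pt]
(1-w)\,\dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{\sum_{B \cap C \notin \{\emptyset,\, \Omega\}} m_1(B)\, m_2(C)}, & \emptyset \neq A \subsetneq \Omega,\\[4pt]
w, & A = \Omega.
\end{cases}
$$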


For the special case Ω = {+, −}, as valid for hormone receptor determination, this leads directly to α + β = 1 − w, where α and β are the masses of {+} and {−}, and w is the basic belief mass of total ignorance.
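As an illustration, the constant-ignorance rule can be sketched in Python. This is a minimal sketch under our own assumptions, not a reference implementation: subsets of the FOD are encoded as integer bitmasks (so B ∩ C becomes the bitwise AND b & c), mass functions are plain dictionaries, and the function name combine_constant_ignorance as well as the input masses are hypothetical. The sketch raises an error when no product mass lands on a proper non-empty subset, because the rule is undefined in that case.

```python
from itertools import product


def combine_constant_ignorance(m1, m2, omega, w):
    """Sketch of an ECR that fixes the mass of total ignorance to w.

    Subsets are integer bitmasks, `omega` is the mask of the whole FOD,
    and m1, m2 are dicts {mask: mass}. The empty set (mask 0) is left out
    of the result, i.e. it implicitly receives mass 0.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Accumulate the products m1(B) * m2(C) on the intersection B & C.
    conj = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        conj[b & c] = conj.get(b & c, 0.0) + mb * mc
    # Normalizer: product mass on subsets that are neither empty nor omega.
    norm = sum(v for a, v in conj.items() if a not in (0, omega))
    if norm == 0.0:
        raise ValueError("no mass on proper non-empty subsets; rule undefined")
    # Distribute 1 - w over the proper non-empty subsets and put w on omega.
    result = {a: (1.0 - w) * v / norm for a, v in conj.items() if a not in (0, omega)}
    result[omega] = w
    return result


# Binary FOD {+, -} with a hypothetical encoding: + -> 0b01, - -> 0b10, omega -> 0b11.
POS, NEG, OMEGA = 0b01, 0b10, 0b11
m = combine_constant_ignorance({POS: 0.6, NEG: 0.1, OMEGA: 0.3},
                               {POS: 0.5, NEG: 0.2, OMEGA: 0.3},
                               OMEGA, w=0.2)
print(m[POS] + m[NEG], m[OMEGA])
```

Whatever the amount of conflict between the two inputs, the masses of {+} and {−} sum to 1 − w by construction; only their ratio depends on the data.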


4.7. Modified Frame of Discernment 


Our FOD consists of only two outcomes, “positive” and “negative”, i.e., Ω = {+, −}. In practice, however, the hormone receptor status is not the only factor in therapy decisions. There are different types of receptor-positive patients, and not all will respond equally well to hormone therapy. Therefore, expanding the presented model at a later time is inevitable. There are two possible approaches to do so:

The first is a refined model. Here, the refinement lies in subdividing “+” further into {+} = {+1, +0}, i.e., Ω = {+1, +0, −}, which can easily be arranged with part of the clinical data, since the ESR receptor status is often given as a (quasi-)continuous parameter.

The second is to adopt the “open world assumption”. There are patients for whom it is basically impossible to make a well-founded choice of the most suitable treatment method based on the receptor status, even if it is measured precisely and reliably. The outcome for such patients is therefore not covered by the FOD. Such a model can be implemented by allowing a strictly positive BBM of the empty set, i.e., m(∅) > 0.
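Both extensions can be sketched with the same bitmask representation of subsets. The following is a minimal sketch under stated assumptions, not the model used in this paper: the bit assignment for {+1, +0, −}, all names and the input masses are hypothetical; refine carries the mass of each coarse focal set over to its image in the refined frame (a refining in Shafer's sense, with “+” mapped to {+1, +0}); and conjunctive_unnormalized is the unnormalized conjunctive rule of Smets' TBM, which leaves the conflicting mass on the empty set, so that m(∅) > 0 can occur.

```python
from itertools import product

# Hypothetical bitmasks for the refined FOD {+1, +0, -}.
POS1, POS0, NEG = 0b001, 0b010, 0b100
OMEGA = POS1 | POS0 | NEG
# Refining map from the coarse FOD {+ -> 0b01, - -> 0b10}: "+" becomes {+1, +0}.
REFINING = {0b01: POS1 | POS0, 0b10: NEG}


def refine(m_coarse):
    """Move each coarse focal set's mass to its image in the refined FOD."""
    out = {}
    for a, v in m_coarse.items():
        image = 0
        for coarse_bit, refined_set in REFINING.items():
            if a & coarse_bit:
                image |= refined_set
        out[image] = out.get(image, 0.0) + v
    return out


def conjunctive_unnormalized(m1, m2):
    """Unnormalized conjunctive rule: conflict stays on the empty set (mask 0)."""
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        out[b & c] = out.get(b & c, 0.0) + mb * mc
    return out


# Made-up inputs: one coarse source (refined first) and one on the refined FOD.
m_coarse = refine({0b01: 0.7, 0b10: 0.2, 0b11: 0.1})
m_refined = {POS1: 0.5, NEG: 0.3, OMEGA: 0.2}
m = conjunctive_unnormalized(m_coarse, m_refined)
print(m.get(0, 0.0))  # mass of the empty set, i.e. the "open world" share
```

Because no renormalization takes place, the mass on the empty set can be read directly as the share of the outcome that lies outside the FOD.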


4.8. Risk Function for Decision Making 


Finally, another open point is the need for a risk function. The consequences of wrong therapy decisions are not symmetric. Adjuvant chemotherapy is often vital, even if only hormone therapy is applied. Some preliminary investigations have already been carried out [25], but for this specific application the topic remains an open field of research. Constructing such risk functions is a heavily investigated topic, and we will return to it in a subsequent paper.


5. Conclusions 


Dempster-Shafer theory of belief functions represents a generalization of Bayesian probability. It provides a powerful framework to proactively include the outcome “uncertain” when the available data are insufficient for a confident decision. Instead of probabilities, pieces of evidence, represented by basic belief masses, are combined by evidence combination rules.

In this paper, we presented a way to parameterize evidence combination rules to adjust models to the nature of the incoming measurements. Data with a high potential to be contradictory (like gene expression and immunohistochemical measurements) are linked in a more conservative manner than data that are expected to agree. Thus, evidence theory avoids several well-known problems with decisions based on conventional statistics.

As a major advancement, our work introduces flexible evidence combination rules that allow the risk to be adapted. Changing the parameters for combining pieces of evidence (i.e., data) alters the probability for a sample to be classified as “well-defined” or as “uncertain”. This is especially helpful for adapting the Dempster-Shafer algorithm to different types of risk and to the treatment decisions directly related to them. Examples are possible over- and under-treatment of particular patients.

To illustrate the strength of evidence theory, we used a case study of hormone receptor status determination for breast cancer samples. As a key outcome, we estimate that conventional clinical decision-making classified slightly too many patients as hormone receptor positive in comparison to our approach. We do not advocate overruling clinical decisions, but rather flagging questionable samples as “uncertain” and suggesting further investigations for these particular patients.

Drawing on flexible evidence combination rules, we see great potential for our approach to advance personalized medicine.
Abbreviations

The following abbreviations are used in this manuscript:

BBM    basic belief mass, same as basic belief assignment (BBA)
BC     breast cancer
DSmT   Dezert-Smarandache theory
DST    Dempster-Shafer theory of evidence
ECR    evidence combination rule
ESR    estrogen
FOD    frame of discernment
GEO    Gene Expression Omnibus
IHC    immunohistochemistry
PCR    proportional conflict redistribution
PGR    progesterone
SLAM   simultaneous localization and mapping
TBM    transferable belief model
Appendix A. Examples of Combining Pieces of Evidence


For convenience, we denote every element S ∈ P(Ω) by an integer a(S). Let Ω = {x_0, x_1, ..., x_{n−1}} and S ⊆ Ω. We define

$$
a(S) = \sum_{k=0}^{n-1} \chi_S(x_k)\, 2^k \qquad \text{(A1)}
$$

with indicator function

$$
\chi_S(y) = \begin{cases} 1, & y \in S,\\ 0, & y \notin S, \end{cases} \qquad \text{(A2)}
$$

and identify m(S) = m(a(S)) = m(a). So m(5) would be the mass of {x_0, x_2}.
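The encoding of Formula (A1) maps directly onto integer bit operations, which is also convenient in code because the intersection B ∩ C corresponds to the bitwise AND of a(B) and a(C). A minimal Python sketch, with hypothetical function names encode and decode and the frame passed as an ordered list (x_0, x_1, ...):

```python
def encode(subset, frame):
    """a(S) from Formula (A1): bit k is set iff frame[k] belongs to S."""
    return sum(1 << k for k, x in enumerate(frame) if x in subset)


def decode(a, frame):
    """Inverse of encode: the subset S whose integer code is a."""
    return {x for k, x in enumerate(frame) if (a >> k) & 1}


frame = ["x0", "x1", "x2"]
print(encode({"x0", "x2"}, frame))  # 5, i.e. m(5) is the mass of {x0, x2}
print(decode(5, frame))
```

For example, a({x_0, x_1}) AND a({x_1, x_2}) = 3 AND 6 = 2 = a({x_1}), so intersecting focal sets during a combination reduces to a single integer operation.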

For illustration of a general case, we consider three mutually exclusive outcomes (e.g., treatments) Ω = {B, R, G} and two pieces of evidence from agents E1 and E2 with mass functions m_i: A → [0, 1], A = {1, 2, ..., 7}, i ∈ {1, 2}, in the notation above. Note that for this demonstration example we assume a closed FOD and therefore do not explicitly write down m1(0) = m2(0) = 0 hereinafter.


Appendix A.1. Two Rather Consistent Agents 


This case often occurs when data from similar sources are to be linked. This could be 
multiple measurements of the same parameter within a short time span, but also correlated 
genes from one gene expression chip. 

The example in Figure A1 shows two agents strongly agreeing on Red. While the first one considers Blue as an alternative, the second one's first alternative is Green.


m1(A) = {0.262, 0.434, 0.130, 0.047, 0.038, 0.039, 0.050}
m2(A) = {0.123, 0.409, 0.122, 0.187, 0.051, 0.042, 0.066}
(m1 ⊕ m2)(A) = {0.138, 0.393, 0.032, 0.045, 0.007, 0.007, 0.378}   λ = 0.1
(m1 ⊕ m2)(A) = {0.166, 0.471, 0.038, 0.054, 0.009, 0.008, 0.254}   λ = 0.5
(m1 ⊕ m2)(A) = {0.207, 0.589, 0.048, 0.068, 0.011, 0.010, 0.067}   λ = 0.9


[Figure A1 panels: conservative (λ = 0.1), medium (λ = 0.5), risky (λ = 0.9)]


Figure A1. Adding two rather consistent pieces of evidence with Formula (4). A larger value of λ (less conservative) seems preferable. Masses for ambiguous outcomes such as {R, B} almost vanish. The first and the second rows show different graphical representations of the same situation.
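The tabulated combinations can be checked numerically. Formula (4) itself is not restated in this appendix, so the following Python sketch rests on an assumption: it takes m(A) = Σ_{B∩C=A} m1(B) m2(C) / (1 − λK) for every proper non-empty A, where K = Σ_{B∩C=∅} m1(B) m2(C) is the conflict, and assigns the remainder to Ω. This form reduces to Dempster's rule for λ = 1 and to Yager's rule (conflict moved to Ω) for λ = 0, and with the masses listed for Figure A1 it agrees, within rounding, with the rows for λ = 0.1, 0.5 and 0.9. The function name combine_lambda is hypothetical.

```python
from itertools import product


def combine_lambda(m1, m2, lam, omega=0b111):
    """Assumed lambda-parameterized ECR on bitmask-encoded subsets.

    conj(A) = sum of m1(B) * m2(C) over B & C == A
    K       = conj(empty set), the conflict
    m(A)    = conj(A) / (1 - lam * K) for proper non-empty A; m(omega) = remainder
    """
    conj = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        conj[b & c] = conj.get(b & c, 0.0) + mb * mc
    k = conj.pop(0, 0.0)  # conflicting mass
    if lam * k >= 1.0:
        raise ValueError("total conflict: rule undefined for this lambda")
    out = {a: v / (1.0 - lam * k) for a, v in conj.items() if a != omega}
    out[omega] = 1.0 - sum(out.values())  # the rest goes to total ignorance
    return out


# Masses m1(a), m2(a) for a = 1..7 as listed for Figure A1.
m1 = dict(zip(range(1, 8), [0.262, 0.434, 0.130, 0.047, 0.038, 0.039, 0.050]))
m2 = dict(zip(range(1, 8), [0.123, 0.409, 0.122, 0.187, 0.051, 0.042, 0.066]))
for lam in (0.1, 0.5, 0.9):
    m = combine_lambda(m1, m2, lam)
    print(lam, [round(m[a], 3) for a in range(1, 8)])
```

Under this assumed form, λ only rescales the non-conflicting products; what changes with λ is how much of the conflict K ends up on Ω rather than being redistributed to the focal sets.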
Appendix A.2. Two Rather Contradictory Agents


This case is likely to occur when different types of data are to be linked. In our application, this means linking immunohistochemistry data with gene expression data. It occurs more often when evidence based on different opinions or measurement methods is to be linked.

The example in Figure A2 shows two agents, where the first one strongly believes in 
Red, while the second one gives most mass to Blue or Green. 


m1(A) = {0.262, 0.434, 0.130, 0.047, 0.038, 0.039, 0.050}
m3(A) = {0.236, 0.085, 0.059, 0.378, 0.100, 0.059, 0.083}
(m1 ⊕ m3)(A) = {0.203, 0.161, 0.023, 0.088, 0.013, 0.009, 0.504}   λ = 0.1
(m1 ⊕ m3)(A) = {0.260, 0.207, 0.029, 0.113, 0.016, 0.012, 0.363}   λ = 0.5
(m1 ⊕ m3)(A) = {0.365, 0.290, 0.041, 0.158, 0.023, 0.016, 0.108}   λ = 0.9


[Figure A2 panels: conservative (λ = 0.1), medium (λ = 0.5), risky (λ = 0.9)]
Figure A2. Adding two rather contradictory pieces of evidence with Formula (4). A smaller value of λ (more conservative) seems preferable in order to stay on the safe side. Obviously, none of the three possible outcomes receives the support necessary to represent a good choice. Contradicting masses should therefore be mostly allocated to total ignorance, m(7) = m(Ω) (compare Formulas (A1) and (A2)), displayed by the central grey disk.


Appendix B. Data Description 


The data used for this study are identical to the data set used in our previous publications and are described in detail in [32]. The data set consists of 3753 samples from 38 studies downloaded from the Gene Expression Omnibus (GEO) [34], including clinical parameters. Of these, 2559 samples were finally selected, for which HER2 status could be determined to be negative with reasonable confidence.

Clinical parameters and, in particular, immunohistochemical measurements were obtained either directly from the GEO database metadata or from the associated publications. All input parameters for our models were also subjected to a double plausibility check. Removal of non-biological batch effects in the gene expression data was done as part of a normalization process that included all studies simultaneously.